Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: add SimpleChannel::NotifyProducer function #314

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

romange
Copy link
Owner

@romange romange commented Sep 7, 2024

Currently SimpleChannel wakes up producer only when the queue has been depleted. This can be suboptimal in some situations, specifically in Dragonfly, consumer can pop an element and block on writing it into disk, an meanwhile producer can not progress. This function provides more advanced control, allowing to wake up producer earlier

@romange romange requested a review from chakaz September 7, 2024 14:52
@@ -81,6 +81,10 @@ template <typename T, typename Queue = folly::ProducerConsumerQueue<T>> class Si
return q_.capacity();
}

// Advanced control - tries to wake up the producer, blocked on a full queue.
void NotifyProducer() {
push_ec_.notify();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to support cancellation of the producer when it is stack in a loop trying to push to the queue but its slots are full. This will solve the bug we saw with debug populate and replication in df. The problem is not that the producer is sleeping, the problem is that the producer infinitely tries to push on a channel that's never empty and never drained. If we support cancellation, the producer will back off, skip the push and continue with its thing

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unaware of this bug, nor I understand how a producer can be stuck if a channel is not full. it should succeed with the push.

Copy link
Contributor

@kostasrim kostasrim Sep 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am talking about this one. dragonflydb/dragonfly#3649

how a producer can be stuck if a channel is not full.

That's the bug. The producer is stuck trying to push because the channel is full and it's blocked. Then the replication is cancelled, the fiber that handles full sync drains the channel but before the cancellation process (another fiber) kicks in the channel gets filled again. Then cancellation fiber kicks in and tries to unregister from the journal but the fiber that is trying to write to the journal is blocked because the channel is full and we deadlock. What solves this cycle is a cancel flag. I even wrote it internally on google chat 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It get's worse when the replica has more flows (because it has more proactors than master) because then multiple threads push to the same channel which gets filled even quicker

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is unrelated to the bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants